{
  "nbformat": 4,
  "nbformat_minor": 0,
  "metadata": {
    "colab": {
      "name": "Pandas DataFrame UltraQuick Tutorial.ipynb",
      "provenance": [],
      "private_outputs": true,
      "collapsed_sections": []
    },
    "kernelspec": {
      "name": "python3",
      "display_name": "Python 3"
    }
  },
  "cells": [
    {
      "cell_type": "code",
      "metadata": {
        "id": "NO_4WhdJZ5Pm",
        "colab_type": "code",
        "cellView": "form",
        "colab": {}
      },
      "source": [
        "#@title Copyright 2020 Google LLC. Double-click here for license information.\n",
        "# Licensed under the Apache License, Version 2.0 (the \"License\");\n",
        "# you may not use this file except in compliance with the License.\n",
        "# You may obtain a copy of the License at\n",
        "#\n",
        "# https://www.apache.org/licenses/LICENSE-2.0\n",
        "#\n",
        "# Unless required by applicable law or agreed to in writing, software\n",
        "# distributed under the License is distributed on an \"AS IS\" BASIS,\n",
        "# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.\n",
        "# See the License for the specific language governing permissions and\n",
        "# limitations under the License."
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "DeKRcpe49VnV",
        "colab_type": "text"
      },
      "source": [
        "# Pandas DataFrame UltraQuick Tutorial\n",
        "\n",
        "This Colab introduces [**DataFrames**](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.html), which are the central data structure in the pandas API. This Colab is not a comprehensive DataFrames tutorial.  Rather, this Colab provides a very quick introduction to the parts of DataFrames required to do the other Colab exercises in Machine Learning Crash Course.\n",
        "\n",
        "A DataFrame is similar to an in-memory spreadsheet. Like a spreadsheet:\n",
        "\n",
        "  * A DataFrame stores data in cells. \n",
        "  * A DataFrame has named columns (usually) and numbered rows."
      ]
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "AByfHsr8H_sU",
        "colab_type": "text"
      },
      "source": [
        "## Import NumPy and pandas modules\n",
        "\n",
        "Run the following code cell to import the NumPy and pandas modules. "
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "ZmL0l551Iibq",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "import numpy as np\n",
        "import pandas as pd"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "RutIK84wIp1S",
        "colab_type": "text"
      },
      "source": [
        "## Creating a DataFrame\n",
        "\n",
        "The following code cell creates a simple DataFrame containing 10 cells organized as follows:\n",
        "\n",
        "  * 5 rows\n",
        "  * 2 columns, one named `temperature` and the other named `activity`\n",
        "\n",
        "The following code cell instantiates a `pd.DataFrame` class to generate a DataFrame. The class takes two arguments:\n",
        "\n",
        "  * The first argument provides the data to populate the 10 cells. The code cell calls `np.array` to generate the 5x2 NumPy array.\n",
        "  * The second argument identifies the names of the two columns."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "FNZsPOgSD4F2",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# Create and populate a 5x2 NumPy array.\n",
        "my_data = np.array([[0, 3], [10, 7], [20, 9], [30, 14], [40, 15]])\n",
        "\n",
        "# Create a Python list that holds the names of the two columns.\n",
        "my_column_names = ['temperature', 'activity']\n",
        "\n",
        "# Create a DataFrame.\n",
        "my_dataframe = pd.DataFrame(data=my_data, columns=my_column_names)\n",
        "\n",
        "# Print the entire DataFrame\n",
        "print(my_dataframe)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "NJ-I78_7OFVs",
        "colab_type": "text"
      },
      "source": [
        "## Adding a new column to a DataFrame\n",
        "\n",
        "You may add a new column to an existing pandas DataFrame just by assigning values to a new column name. For example, the following code creates a third column named `adjusted` in `my_dataframe`: "
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "JEBZyMdEOngx",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# Create a new column named adjusted.\n",
        "my_dataframe[\"adjusted\"] = my_dataframe[\"activity\"] + 2\n",
        "\n",
        "# Print the entire DataFrame\n",
        "print(my_dataframe)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "RJ2aziCR5th2",
        "colab_type": "text"
      },
      "source": [
        "## Specifying a subset of a DataFrame\n",
        "\n",
        "Pandas provide multiples ways to isolate specific rows, columns, slices or cells in a DataFrame. "
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "RIO91Fu65s6k",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "print(\"Rows #0, #1, and #2:\")\n",
        "print(my_dataframe.head(3), '\\n')\n",
        "\n",
        "print(\"Row #2:\")\n",
        "print(my_dataframe.iloc[[2]], '\\n')\n",
        "\n",
        "print(\"Rows #1, #2, and #3:\")\n",
        "print(my_dataframe[1:4], '\\n')\n",
        "\n",
        "print(\"Column 'temperature':\")\n",
        "print(my_dataframe['temperature'])"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "_cL_NxAdZzdS",
        "colab_type": "text"
      },
      "source": [
        "## Task 1: Create a DataFrame\n",
        "\n",
        "Do the following:\n",
        "\n",
        "  1. Create an 3x4 (3 rows x 4 columns) pandas DataFrame in which the columns are named `Eleanor`,  `Chidi`, `Tahani`, and `Jason`.  Populate each of the 12 cells in the DataFrame with a random integer between 0 and 100, inclusive.\n",
        "\n",
        "  2. Output the following:\n",
        "\n",
        "     * the entire DataFrame\n",
        "     * the value in the cell of row #1 of the `Eleanor` column\n",
        "\n",
        "  3. Create a fifth column named `Janet`, which is populated with the row-by-row sums of `Tahani` and `Jason`.\n",
        "\n",
        "To complete this task, it helps to know the NumPy basics covered in the NumPy UltraQuick Tutorial. \n"
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "cIJEv08DMSxj",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# Write your code here."
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "dPmpVM_8IoBO",
        "colab_type": "code",
        "cellView": "form",
        "colab": {}
      },
      "source": [
        "#@title Double-click for a solution to Task 1.\n",
        "\n",
        "# Create a Python list that holds the names of the four columns.\n",
        "my_column_names = ['Eleanor', 'Chidi', 'Tahani', 'Jason']\n",
        "\n",
        "# Create a 3x4 numpy array, each cell populated with a random integer.\n",
        "my_data = np.random.randint(low=0, high=101, size=(3, 4))\n",
        "\n",
        "# Create a DataFrame.\n",
        "df = pd.DataFrame(data=my_data, columns=my_column_names)\n",
        "\n",
        "# Print the entire DataFrame\n",
        "print(df)\n",
        "\n",
        "# Print the value in row #1 of the Eleanor column.\n",
        "print(\"\\nSecond row of the Eleanor column: %d\\n\" % df['Eleanor'][1])\n",
        "\n",
        "# Create a column named Janet whose contents are the sum\n",
        "# of two other columns.\n",
        "df['Janet'] = df['Tahani'] + df['Jason']\n",
        "\n",
        "# Print the enhanced DataFrame\n",
        "print(df)"
      ],
      "execution_count": 0,
      "outputs": []
    },
    {
      "cell_type": "markdown",
      "metadata": {
        "id": "bh7MeyafemNL",
        "colab_type": "text"
      },
      "source": [
        "## Copying a DataFrame (optional)\n",
        "\n",
        "Pandas provides two different ways to duplicate a DataFrame:\n",
        "\n",
        "* **Referencing.** If you assign a DataFrame to a new variable, any change to the DataFrame or to the new variable will be reflected in the other. \n",
        "* **Copying.** If you call the `pd.DataFrame.copy` method, you create a true independent copy.  Changes to the original DataFrame or to the copy will not be reflected in the other. \n",
        "\n",
        "The difference is subtle, but important."
      ]
    },
    {
      "cell_type": "code",
      "metadata": {
        "id": "YDu2VotPgzsW",
        "colab_type": "code",
        "colab": {}
      },
      "source": [
        "# Create a reference by assigning my_dataframe to a new variable.\n",
        "print(\"Experiment with a reference:\")\n",
        "reference_to_df = df\n",
        "\n",
        "# Print the starting value of a particular cell.\n",
        "print(\"  Starting value of df: %d\" % df['Jason'][1])\n",
        "print(\"  Starting value of reference_to_df: %d\\n\" % reference_to_df['Jason'][1])\n",
        "\n",
        "# Modify a cell in df.\n",
        "df.at[1, 'Jason'] = df['Jason'][1] + 5\n",
        "print(\"  Updated df: %d\" % df['Jason'][1])\n",
        "print(\"  Updated reference_to_df: %d\\n\\n\" % reference_to_df['Jason'][1])\n",
        "\n",
        "# Create a true copy of my_dataframe\n",
        "print(\"Experiment with a true copy:\")\n",
        "copy_of_my_dataframe = my_dataframe.copy()\n",
        "\n",
        "# Print the starting value of a particular cell.\n",
        "print(\"  Starting value of my_dataframe: %d\" % my_dataframe['activity'][1])\n",
        "print(\"  Starting value of copy_of_my_dataframe: %d\\n\" % copy_of_my_dataframe['activity'][1])\n",
        "\n",
        "# Modify a cell in df.\n",
        "my_dataframe.at[1, 'activity'] = my_dataframe['activity'][1] + 3\n",
        "print(\"  Updated my_dataframe: %d\" % my_dataframe['activity'][1])\n",
        "print(\"  copy_of_my_dataframe does not get updated: %d\" % copy_of_my_dataframe['activity'][1])"
      ],
      "execution_count": 0,
      "outputs": []
    }
  ]
}
